This ‘Intro to R Lesson Series’ is brought to you by the Centre for the Analysis of Genome Evolution & Function’s (CAGEF) bioinformatics training initiative. This course was developed based on feedback on the needs and interests of the Department of Cell & Systems Biology and the Department of Ecology and Evolutionary Biology.
This lesson is the third in a 6-part series. The idea is that at the end of the series, you will be able to import and manipulate your data, make exploratory plots, perform some basic statistical tests, test a regression model, and make some even prettier plots and documents to share your results.
How do we get there? Today we are going to be learning how to make all sorts of plots - from simple data exploration to interactive plots.The next lesson will be data cleaning and string manipulation; this is really the battleground of coding - getting your data into the format where you can analyse it. Then we will learn how to do t-tests and perfrom regression and modeling in R. And lastly, we will learn to write some functions, which really can save you time and help scale up your analyses.
The structure of the class is a code-along style. It is hands on. The lecture AND code we are going through are available on GitHub for download at https://github.com/eacton/CAGEF (Note: repo is private until approved), so you can spend the time coding and not taking notes. As we go along, there will be some challenge questions and multiple choice questions on Socrative. At the end of the class if you could please fill out a post-lesson survey (https://www.surveymonkey.com/r/VNQZ3KS), it will help me further develop this course and would be greatly appreciated.
Objective: At the end of this session you will be able to use ggplot2 to make a ton of different types of plots with your data for both for data exploration and for publication-quality figures.
(Figure borrowed from Paul Hiemstra)
Grammar of graphics: a language to be able to communicate about what we are plotting programatically
The aesthetics are what you can see. For example, the data being plotted or represented by a shape or colour. This data could be presented in multiple ways. Data is mapped to aesthetics. A plot can have multiple layers (for example, a scatterplot with a regression line).
Lines, bars, and points are examples of geometric objects (geoms) that could be used to present the data.
The data units need to be converted to physical units in order to be displayed. This uses scaling and a coordinate system (position adjustment for where is our pixel going). Other statistical transformations can also occur, such as aggregating data, jittering, density estimates, a boxplot and binning.
Facetting allows us to plot subsets of the data.
This grammar allows very specific control over your plot and the ability to change features (easily). Plots can be scaled and made pretty enough for publication quality images.
ggplot2 was made to interact well with tidy (long) datasets. I have found that if someone is spending lots of time figuring out how to make a scatterplot, his or her data is probably not in the correct format.
Metagenomic 16SrRNA sequencing of latrines from Tanzania and Vietnam at different depths (multiples of 20cm).
We have 2 csv files (change one to tsv or xlsx - maybe both and make an additional files and get a google spreadsheet): 1. A metadata file (Naming conventions: [Country_LatrineNo_Depth]) with sample names and environmental variables. 2. A table of species abundance.
B Torondel, JHJ Ensink, O Gunvirusdu, UZ Ijaz, J Parkhill, F Abdelahi, V-A Nguyen, S Sudgen, W Gibson, AW Walker, and C Quince. Assessment of the influence of intrinsic environmental and geographical factors on the bacterial ecology of pit latrines Microbial Biotechnology, 9(2):209-223, 2016. DOI:10.1111/1751-7915.12334
I have saved a version of the OTU table in tidy format (which we created in Lesson 2 - split_dat). As well as a modified version of the metadata table.
Let’s read our data tables in and also load ggplot2. We will be using a variety of packages today that are grouped in tidyverse, so let’s load tidyverse. You can see in the package startup message that ggplot2 is one of the attached packages. You will also see in the ‘Conflicts’ section that dplyr::filter() and dplyr::lag() mask functions that are named the same thing from the stats package. This just means that if you call filter() it will by default be from dplyr. You can still call the masked function by explicitly using the version from the masked package, stats::filter().
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.4
## ✔ tidyr 0.8.0 ✔ stringr 1.3.0
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
dat <- read.csv("data/long_SPE_pitlatrine.csv", stringsAsFactors = F, header = TRUE)
ndat <- read.csv("data/split_ENV_pitlatrine.csv")
Let’s build a plot by adding components one by one to see how the grammar of graphics is implemented in ggplot2. To start, we of course need to input our data. However, if that is all we input, what we get back is a blank graphic. We have not yet said what we want to plot and how we want to plot it.
ggplot(ndat)
We have, however, created a ggplot object. This is a list of 9 parameters: data, layers, scales, mapping, theme, coordinates, facet, plot environment, and labels. Luckily there are some defaults, so we don’t have to specify everything, but you can start to see how ggplot objects are highly customizable.
str(ggplot(ndat))
## List of 9
## $ data :'data.frame': 81 obs. of 14 variables:
## ..$ Country : Factor w/ 2 levels "Tanzania","Vietnam": 1 1 1 1 1 1 1 1 1 1 ...
## ..$ Latrine_Number: int [1:81] 2 2 2 2 2 2 2 2 3 3 ...
## ..$ Depth : chr [1:81] "1" "10" "12" "2" ...
## ..$ pH : num [1:81] 7.82 9.08 8.84 6.49 6.46 7.69 7.48 7.6 7.55 7.68 ...
## ..$ Temp : num [1:81] 25.1 24.2 25.1 29.6 27.9 28.7 29.8 25 28.8 28.9 ...
## ..$ TS : num [1:81] 14.5 37.8 71.1 13.9 29.4 ...
## ..$ VS : num [1:81] 71.33 31.52 5.94 64.93 26.85 ...
## ..$ VFA : num [1:81] 71 2 1 3.7 27.5 1.5 1.1 1.1 30.9 24.2 ...
## ..$ CODt : int [1:81] 874 102 35 389 161 57 107 62 384 372 ...
## ..$ CODs : int [1:81] 311 9 4 180 35 3 9 8 57 57 ...
## ..$ perCODsbyt : int [1:81] 36 9 10 46 22 6 8 13 15 15 ...
## ..$ NH4 : num [1:81] 3.3 1.2 0.5 6.2 2.4 0.8 0.7 0.9 21.6 20.4 ...
## ..$ Prot : num [1:81] 35.4 18.4 0 29.3 19.4 0 14.1 7.6 33.1 44.5 ...
## ..$ Carbo : num [1:81] 22 43 17 25 31 14 28 28 47 48 ...
## $ layers : list()
## $ scales :Classes 'ScalesList', 'ggproto' <ggproto object: Class ScalesList>
## add: function
## clone: function
## find: function
## get_scales: function
## has_scale: function
## input: function
## n: function
## non_position_scales: function
## scales: NULL
## super: <ggproto object: Class ScalesList>
## $ mapping : list()
## $ theme : list()
## $ coordinates:Classes 'CoordCartesian', 'Coord', 'ggproto' <ggproto object: Class CoordCartesian, Coord>
## aspect: function
## distance: function
## expand: TRUE
## is_linear: function
## labels: function
## limits: list
## range: function
## render_axis_h: function
## render_axis_v: function
## render_bg: function
## render_fg: function
## train: function
## transform: function
## super: <ggproto object: Class CoordCartesian, Coord>
## $ facet :Classes 'FacetNull', 'Facet', 'ggproto' <ggproto object: Class FacetNull, Facet>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map: function
## map_data: function
## params: list
## render_back: function
## render_front: function
## render_panels: function
## setup_data: function
## setup_params: function
## shrink: TRUE
## train: function
## train_positions: function
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet>
## $ plot_env :<environment: R_GlobalEnv>
## $ labels : list()
## - attr(*, "class")= chr [1:2] "gg" "ggplot"
The next step is to choose the data we are plotting (aesthetics). At this point the data can be scaled and the axes appear. We have not yet specified how we want the data plot, just which data should be plotted. In practice, people usually omit ‘mapping =’, but it is a good reminder that mapping is, in fact, what we are doing.
ggplot(ndat, mapping = aes(x = TS, y = CODt))
We now have chosen the geometric object (geom) with which to plot our data, in this case a point. A geom could be a line, a bar, a boxplot - you can type geom_ and then Tab to see all of the available options. Autocomplete can also be helpful for remembering syntax.
ggplot(ndat, aes(x=TS, y=CODt)) + geom_point()
The data looks like there are two groupings. My guess would be that this is for the 2 different countries. We can easily test this by colouring our points by Country. Look at the structure of ndat in either the Global Environment or using str(). Note that Country is a factor. A colour will be chosen for each factor level. A legend will be automatically created for you.
The aesthetic of colour which is mapped to Country can be specified when we are selecting the data to be plotted, or, as it describes the colour of our points, can be specified in geom_point(). I usually do the former, because it is more likely that I will change how I plot a figure (points, lines, bars) versus what I plot in a figure (TS, CODt, Country). There are minor implications for each of these choices.
ggplot(ndat, aes(x=TS, y=CODt, colour = Country)) + geom_point()
#is equivalent-ish to
ggplot(ndat, aes(x=TS, y=CODt)) + geom_point(aes(colour = Country))
Some of our data points seem to be crushed near the x-axis. We can scale the y-axis to fix this. When we start customizing our plot, our code starts to get a bit harder to read on one line. We can create each specification on a new line by ending each line with +.
ggplot(ndat, aes(x=TS, y=CODt, colour = Country)) +
geom_point(alpha = 0.3) +
scale_y_log10()
Faceting allows us to split our data into groups. Note that I have removed the colour. It is good data visualization practice to only have one attribute (colour, shading, faceting, symbols) per grouping.
ggplot(ndat, aes(x=TS, y=CODt)) +
geom_point() +
scale_y_log10() +
facet_grid(~Country)
I could now add information from another variable as a colour in this plot. Note that if a variable is continuous instead of discrete, the colour will be a gradient.
ggplot(ndat, aes(x=TS, y=CODt, colour = Temp)) +
geom_point() +
scale_y_log10() +
facet_grid(~Country)
If we really wanted to we could add another variable to the plot by changing the shape attribute. Let’s change Country to having 2 shapes and facet by the Depth of the latrine instead. Note that shape can only be used for discrete values. A quick reference key for shapes can be found in the ‘Cookbook for R’ (http://www.cookbook-r.com/Graphs/Shapes_and_line_types/).
ggplot(ndat, aes(x=TS, y=CODt, colour = Temp, shape = Country)) +
geom_point() +
scale_y_log10() +
facet_wrap(~Depth, scales = "free_y")
We can now note that only Tanzania had sampling greater than 4cm of depth. There are single latrines for 4 samples. There was no latrine at a depth of 11cm. Lack of replication and a bias towards Tanzania for the higher depths is something we should keep in mind while looking at this data. Depending on the question we are trying to answer, we may want to remove some of this data.
We can also note that our numbers are being ordered as if they were characters. In the last lesson we created Depth by splitting a sample name. We can change Depth to numeric type, and the numbers will be ordered properly.
ndat$Depth <- as.numeric(ndat$Depth)
ggplot(ndat, aes(x=TS, y=CODt, colour = Temp, shape = Country)) +
geom_point() +
scale_y_log10() +
facet_wrap(~Depth)
One thing that is not necessary in this case - but good to know about - is the ability to allow each grid to have its own independent axis scale. In our example, wells of Depths from 1-4cm have up to 1000 CODt, while the other wells barely have values past 100 CODt. This can be changed, but keep in mind most people will assume all grids have the same scale, so take extra care to point that the scales are different when presenting or publishing.
ggplot(ndat, aes(x=TS, y=CODt, colour = Temp, shape = Country)) +
geom_point() +
scale_y_log10() +
facet_wrap(~Depth, scales = "free_y")
You can also add statistical transformations to your plots. Again, take a look at stat_ then Tab to see the list of options. In this case let’s separately fit a linear regression line to CODt vs TS for each country. The grey area around the line is the confidence interval (default=0.95) and can be removed with the additional call to stat_smooth of se = FALSE.
ggplot(ndat, aes(x=TS, y=CODt)) +
geom_point() +
scale_y_log10() +
facet_grid(~Country) +
stat_smooth(method = lm)
A linear model is not always the best fit. The method of calculating the smoothing function can be changed to other provided functions (such as loess, used below) or can be a custom formula. Note that I changed the confidence interval by modifying level=0.8. geoms can be made more transparent with the alpha parameter, which is set to 0.3 in the following code so that the emphasis is on the regression line rather than the points.
ggplot(ndat, aes(x=TS, y=CODt)) +
geom_point(alpha = 0.3) +
scale_y_log10() +
facet_grid(~Country) +
stat_smooth(method = loess, level = 0.8)
We are making a density plot for OTUs by Country. I have set alpha (transparency) to 0.3 so that we can see both countries on our plot.
ggplot(dat, aes(x=OTUs, fill=Country, alpha=0.3)) +
geom_density()
The first thing to notice is that everything is clumped at 0. This is because we have not filtered our data frame to remove all observations where OTUs are zero. Here we filter to have at least 2 OTUs. The other thing to notice is that there is a long tail where there will only be a few observations. It will be necessary to change the x-axis to see our data. This is done by setting ‘limits’ (lower and upper boundaries) on the x-axis with xlim(). Note that R gives us a warning that we are not viewing 158 of our 4212 rows. We can add a rug geom to see each value.
ggplot(dat[dat$OTU >=2,], aes(x=OTUs, fill=Country, alpha=0.3)) +
geom_density() +
geom_rug() +
xlim(0, 1000)
## Warning: Removed 158 rows containing non-finite values (stat_density).
Histograms instead count the number of observations you have in each ‘bin’ that you specify. The default binwidth is 30, which means your data will be divided into a new bin every 30 units along your x-axis. THIS HAS NOTHING TO DO WITH YOUR DATA!! CHANGE IT!! R will even warn you to change your binwidth.
ggplot(dat[dat$OTU >=2,], aes(x=OTUs, fill=Country, alpha=0.3)) +
geom_histogram() +
xlim(0,1000)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 158 rows containing non-finite values (stat_bin).
Instead of having the countries information stacked, we may want to see the data side by side. This can be done with the parameter position set to ‘dodge’. A rug geom can also be added to a histogram. Note that the binwidth has been changed and limits have also been set on the y-axis.
ggplot(dat[dat$OTU >=2,], aes(x=OTUs, fill=Country, alpha=0.3)) +
geom_histogram(binwidth = 50, position = "dodge") +
xlim(0,1000) +
ylim(0,150) +
geom_rug()
Challenge
Using the data filtered for OTUs greater than or equal to two, make a violin plot of the distribution of OTUs for each country. Use a log scale. Colour the violin plots by Country. Draw lines across the violin plot where the quantiles (25th, 50th, 75th) are for each plot. What do the widths of the plots represent?
Let’s create a bar plot of OTUs per country and use the filled in colour to represent Taxa.
ggplot(dat, aes(x=Country, y=OTUs, fill=Taxa)) +
geom_bar()
## Error: stat_count() must not be used with a y aesthetic.
This is a common error because the default for geom_bar is to use the y-axis for a count. To use it for a variable instead, we have to specify stat=“identity”.
ggplot(dat[dat$OTU >=2,], aes(x=Country, y=OTUs, fill=Taxa)) +
geom_bar(stat = "identity")
In the legend for the diagram above our factor levels for Taxa are in alphabetical order. However, an ordering that might be more useful would be an order that matches our data, for example, Taxa in descending order of OTUs. To do this, we can use the forcats package (included in tidyverse and already loaded). fct_reorder2() takes the factor we want ordered (Taxa) and orders it by the values given (Country, OTUs).
We will talk about customizing plots later in this lesson, but the last line of code is one way to remove the legend title (by making the legend title ‘blank’).
ggplot(dat[dat$OTU >=2,], aes(x=Country, y=OTUs, fill=fct_reorder2(Taxa, Country, OTUs))) +
geom_bar(stat = "identity") +
theme(legend.title = element_blank())
Now our highest abundance Taxa is the first value in our legend, and this matches the order of the Taxa in our bar graph. The legend order now has meaning. The white line in the Vietnam bar graph is a Taxa for which there was an OTU value in Tanzania but no data in Vietnam (since anything <=2 was filtered out of the data set).
Let’s take a second to look at what happens when stat does not equal ‘identity’. What is being counted instead of OTUs? Why does the colouring for Taxa look so different?
ggplot(dat, aes(x=Country, fill=Taxa)) +
geom_bar()
You can alternate between ‘stacked’ or ‘dodged’ (as we did with the histogram) for whether your bars are on top of each other or next to each other when splitting by a factor or categorical variable.
ggplot(ndat, aes(x=Depth, y=pH, fill = Country)) +
geom_bar(stat="identity", position = "dodge")
You can have your bars horizonal instead of vertical by using coord_flip().
ggplot(ndat, aes(x=Depth, y=pH, fill = Country)) +
geom_bar(stat="identity", position = "dodge") +
coord_flip()
In ggplot2, circular plots are related to bar graphs - they just have different coordinate systems. The default coordinate system is cartesian coordinates, and we need to switch to polar coordinates to make a circle. This is a Coxcomb plot - pH levels increase as you move up the outer rings, and depth increases as you move clockwise around the circle. Colour is still represented by country.
ggplot(ndat, aes(x=Depth, y=pH, fill = Country)) +
geom_bar(stat="identity", position = "dodge") +
coord_polar()
To make your classic venn diagram, use theta to specify what variable is going to be used to make up the angles (width of pie slices). To wrap to a full circle instead of having sections (as in the above Coxcomb plot), the width is set to one.
ggplot(dat, aes(x="", y=Country, fill = Country)) +
geom_bar(stat = "identity", width = 1) +
coord_polar(theta = "y")
To draw a line graph, we select geom_line().
ggplot(ndat, aes(x=Temp, y=pH)) + geom_line()
We can colour the lines for the different countries. Looking at our previous graph, what must have happened when these values were over the same range?
ggplot(ndat, aes(x=Temp, y=pH, colour = Country)) + geom_line()
To plot error bars, we need to give the geom, geom_errorbar(), and the interval over which we want to draw the bar. This interval is usually one standard deviation, one standard error or a confidence interval.
Luckily, we have learned how to calculate the mean and standard deviation with dplyr. Once we have that, all we need to do is add or subtract the standard deviation from our data points to get the bounds of the error bar.
geom_errorbar() takes these bounds passed to the parameters ymax and ymin. In this case, since there were no sample replicates the standard deviation is taken from the mean of all of the soil readings in a given country at a given depth.
errdat <- ndat %>% group_by(Country, Depth) %>% mutate(mean_pH = mean(pH)) %>% mutate(sd_pH= sd(pH))
ggplot(errdat, aes(x=Temp, y=pH, colour = Country)) +
geom_line() +
geom_errorbar(aes(ymin=pH-sd_pH, ymax = pH+sd_pH), width = 0.2, alpha = 0.4)
## Warning: Removed 4 rows containing missing values (geom_errorbar).
The first plots we made we scatterplots with 2 continuous variables. With one discrete variable and one categorical variable, we can make a stripplot. We use the point geom for both of these types of plots.
ggplot(dat, aes(x = Depth, y = OTUs)) + geom_point()
Again, we have a lot of values crushed near the x-axis. If we add log scaling to the y-axis, we get an error that this transformation created infinite values. This is because we have zeros in our data set.
ggplot(dat, aes(x = Depth, y = OTUs)) +
geom_point() +
scale_y_log10()
## Warning: Transformation introduced infinite values in continuous y-axis
Knowing this, we could ignore the warning, or add +1 to each OTU (negligable on a log scale).
ggplot(dat, aes(x = Depth, y = (OTUs+1))) +
geom_point() +
scale_y_log10()
In order to see the points a little better, we can ‘jitter’ them. Jittering spreads out our data points while keeping them in the same area of the plot so we can get an idea of density.
ggplot(dat, aes(x = Depth, y = (OTUs+1))) +
geom_point(position = "jitter") +
scale_y_log10() +
facet_wrap(~Country)
With bubble plots, we need to provide the size of our bubbles in a way that is proportional to our data, and provide a scale to map these to.
Let’s group our OTUs by Country and Taxa and look at the Taxa for the top 30 OTUs. As a refresher, try to do this using dplyr functions.
We are going to make our bubbles (which can be thought of as large points) with geom_jitter() so that our bubbles don’t overlap. Remember that we want to specify what we are plotting with aesthetics. We want the bubbles representing OTUs_per_Taxa, so we divide each value by pi when calling size.
bdat <- dat %>%
select(-Latrine_Number, -Depth) %>%
group_by(Taxa, Country) %>%
mutate(OTUs_per_Taxa = sum(OTUs)) %>%
select(-OTUs)
bdat <- unique(bdat) %>% arrange(desc(OTUs_per_Taxa)) %>% .[1:30,]
ggplot(bdat, aes(x = Country, y = OTUs_per_Taxa, fill=Taxa)) +
scale_y_log10() +
geom_jitter(aes(size = OTUs_per_Taxa/pi), pch = 21, show.legend = TRUE) +
guides(fill = FALSE)
However, it is hard to tell proportionally how different these sizes are. Mapping values to a scale gives us more of an idea of their relative sizes. This is done by giving a range of values to scale_size_continuous(). This range is the size you want your plotting symbols (bubbles) to be after transformation. You can use trial and error to see what range you like; it is not changing your data, but rather your perception of the data.
ggplot(bdat, aes(x = Country, y = OTUs_per_Taxa, fill=Taxa)) +
scale_y_log10() +
geom_jitter(aes(size = OTUs_per_Taxa/pi), pch = 21, show.legend = FALSE) +
scale_size_continuous(range=c(1,30)) +
guides(fill=FALSE)
####Labelling Data Points
ggplot2 provides 2 methods of labelling: geom_label() and geom_text(). Let’s look at the help menu to find out what the difference between them is.
We have to specify in with aesthetics what we are plotting for our label.
ggplot(bdat, aes(x = Country, y = OTUs_per_Taxa, fill=Taxa)) +
scale_y_log10() +
geom_jitter(aes(size = OTUs_per_Taxa/pi), pch = 21, show.legend = FALSE) +
scale_size_continuous(range=c(1,30)) +
geom_label(aes(label = Taxa), show.legend = FALSE)
This isn’t fantastic because our labels are overlapping. What parameters might you be able to use to move the labels? Try using them to get the labels to not overlap.
The text geom has an option to check for overlap of labels. geom_text() does not provide a background for the label, and it may be harder to tell which label belongs to each data point.
ggplot(bdat, aes(x = Country, y = OTUs_per_Taxa, fill=Taxa)) +
scale_y_log10() +
geom_jitter(aes(size = sqrt(OTUs_per_Taxa/pi)), pch = 21, show.legend = FALSE) +
scale_size_continuous(range=c(1,30)) +
geom_text(aes(label = Taxa), check_overlap = TRUE, show.legend = FALSE)
What happened here? Why don’t we have as many labels? Look in the help menu to explain this behaviour.
I don’t like futzing with label positions, so I went looking for a package that would do this for me. ggrepel will ‘repel’ your labels away from each other without getting rid of them. Let’s check it out with our bubble plot labels. Install and load ggrepel. The equivalent function we can use is geom_label_repel(). The force parameter allow you to modify how far you want your labels pushed away from each other.
Here, the box colour is being used to map to our data points but ggrepel can instead connect a line to data points by altering the value of segment.size. Variations on use can be found here: https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html.
library(ggrepel)
ggplot(bdat, aes(x = Country, y = OTUs_per_Taxa, fill=Taxa)) +
scale_y_log10() +
geom_jitter(aes(size = sqrt(OTUs_per_Taxa/pi)), pch = 21, show.legend = FALSE) +
scale_size_continuous(range=c(1,30)) +
geom_label_repel(aes(label = Taxa), force = 2, show.legend = FALSE, segment.size = 0)
Boxplots are a great way to visualize summary statistics for your data. As a reminder, the thick line in the center of the box is the median. The upper and lower ends of the box are the first and third quartiles (or 25th and 75th percentiles) of your data. The whiskers extend to the largest value no further than 1.5*IQR (inter-quartile range - the distance between the first and third quartiles). Data beyond these whiskers are considered outliers and plotted as individual points. This is a quick way to see how comparable your samples or variables are.
We are going to use boxplots to see the distribution of OTUs per Taxa across all samples.
ggplot(dat, aes(x = Taxa, y = OTUs)) +
geom_boxplot()
While we are going to address customization and what theme elements are shortly, but I think showing you now how to rotate the x-axis labels is appropriate. Essentially we are taking the text on the x-axis and rotating it by 90 degrees.
ggplot(dat, aes(x = Taxa, y = OTUs)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle=90))
We then justify the labels such that they align with the x-axis.
You may ask why this is a horizontal (hjust) justification, when it seems like moving the labels upwards towards the x-axis should be a vertical (vjust) justification. If you look in the help menu at element_text() you will see that the justification is carried out before the rotation. While we can specify the parameters of element_text() in any order, this does not change the order they are executed in the function.
ggplot(dat, aes(x = Taxa, y = OTUs)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle=90, hjust = 1))
While it is clear that Clostridia is the most represented taxa, it is difficult to tell whether some other taxa have no representation, or if they are lowly represented. Transforming to a log scale on the y-axis will sort this out for us.
ggplot(dat, aes(x = Taxa, y = OTUs)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
scale_y_log10()
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 2700 rows containing non-finite values (stat_boxplot).
We could also clean this plot up a bit by removing those taxa with 1 or fewer OTUs as well as the condition that a taxa must be present in more than 1 sample.
keep <- dat %>%
ungroup() %>%
filter(OTUs>=2) %>%
group_by(Taxa) %>%
summarize(n = n()) %>%
filter(n > 1) %>%
select(Taxa)
ggplot(dat[dat$Taxa %in% keep$Taxa,] , aes(x = Taxa, y = OTUs)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
scale_y_log10()
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 1431 rows containing non-finite values (stat_boxplot).
If we facet by country we can see that Acidobacteria_Gp1, for example, is only found in Vietnam and not Tanzania.
ggplot(dat[dat$Taxa %in% keep$Taxa,] , aes(x = Taxa, y = OTUs)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
scale_y_log10() + facet_grid(~Country)
For now, let’s just keep the Taxa that are common to both countries.
keep <- dat %>%
ungroup() %>%
filter(OTUs>=2) %>%
group_by(Country, Taxa) %>%
summarize(n = n()) %>%
filter(n > 1)
keep <- unique(keep[duplicated(keep$Taxa),"Taxa"])
ggplot(dat[dat$Taxa %in% keep$Taxa,] , aes(x = Taxa, y = OTUs)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
scale_y_log10() +
facet_grid(~Country)
Even though boxplots give us summary statistics on our data, it is useful to be able to see where our data points are. Adding the data using geom_point() adds our data in the form of a stripplot to our boxplots. It would be nice to see how many points we have at each OTU value, but there are a lot of points overlapping here.
ggplot(dat[(dat$Taxa=="Clostridia" | dat$Taxa == "Gammaproteobacteria" | dat$Taxa == "Unknown" | dat$Taxa == "Bacilli") & dat$Country == "T",] , aes(x = Taxa, y = OTUs)) +
geom_boxplot() +
geom_point() +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
scale_y_log10()
Jittering the points gives a general idea, but is a bit too widely dispersed to give a good sense of the data.
ggplot(dat[(dat$Taxa=="Clostridia" | dat$Taxa == "Gammaproteobacteria" | dat$Taxa == "Unknown" | dat$Taxa == "Bacilli") & dat$Country == "T",] , aes(x = Taxa, y = OTUs)) +
geom_boxplot() +
geom_jitter() +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
scale_y_log10()
A beeswarm plot places data points that were overlapping next to each other, so we can get a better picture of the distribution of points. We simply overlay the points with geom_beeswarm().
library(ggbeeswarm)
ggplot(dat[(dat$Taxa=="Clostridia" | dat$Taxa == "Gammaproteobacteria" | dat$Taxa == "Unknown" | dat$Taxa == "Bacilli") & dat$Country == "T",] , aes(x = Taxa, y = OTUs)) +
geom_boxplot() +
geom_beeswarm() +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
scale_y_log10()
Increasing the spacing between data points (increasing ‘cex’) can make this distribution a bit clearer.
ggplot(dat[(dat$Taxa=="Clostridia" | dat$Taxa == "Gammaproteobacteria" | dat$Taxa == "Unknown" | dat$Taxa == "Bacilli") & dat$Country == "T",] , aes(x = Taxa, y = OTUs)) +
geom_boxplot() +
geom_beeswarm(cex = 2.2) +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
scale_y_log10()
Using geom_quasirandom() gives the empirical distribution of the stripplot to avoid overplotting.
ggplot(dat[(dat$Taxa=="Clostridia" | dat$Taxa == "Gammaproteobacteria" | dat$Taxa == "Unknown" | dat$Taxa == "Bacilli") & dat$Country == "T",] , aes(x = Taxa, y = OTUs)) +
geom_boxplot(alpha=0.8) +
geom_quasirandom(varwidth = TRUE) +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
scale_y_log10()
Other spacing and distribution options are available at https://github.com/eclarke/ggbeeswarm.
There are 3 main types of colour palettes. 1. Sequential - implies an order to your data - ie. light to dark implies low values to high values. 2. Diverging - low and high values are extremes, and the middle values are important - still goes from light to dark, but 3 colours mainly used. 3. Qualitative - there is no quantitative relationship between colours. This is usually used for categorical data.
Which of these is ggplot2’s default color palette?
Many colour palettes now exist. I’ll showcase a couple that work nicely with ggplot2. These packages also have colorblind friendly options. RColorBrewer has options for these 3 types of palettes, which you can see with display.brewer.all(). With a smaller dataset, we could make a call in ggplot directly to scale_fill_brewer(), which just requires a choice of one of RColorBrewer’s palettes, such as “Spectral”. However, we have 24 Taxa and these palettes have 8-12 colours, so we have to get creative.
library(RColorBrewer)
display.brewer.all()
p + scale_fill_brewer(palette = "Spectral")
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 607 rows containing non-finite values (stat_boxplot).
## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Spectral is 11
## Returning the palette you asked for with that many colors
I have simply taken the 2 qualitative palettes that each have a length of 12, put them into one palette, and made sure my values were unique. This can then be passed to ggplot via scale_fill_manual().
palette1 <- brewer.pal(12, "Paired")
palette2 <- brewer.pal(12, "Set3")
length(unique(c(palette, palette2)))
## [1] 13
custom <- c(palette1, palette2)
p + scale_fill_manual(values = custom)
You can always choose a vector of your own colors using this ‘R color cheatsheet’(https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/colorPaletteCheatsheet.pdf).
Names of colours as well as colour codes are accepted.
scale_fill_manual(values=c("purple", "cornflowerblue", "grey", "yellow", "orange", "FF0000"))
The viridis package also has some nice color palettes (https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html). I think they might all be diverging palettes (qualitative is best for our Taxa), but I will showcase a couple here.
library(viridis)
## Loading required package: viridisLite
p + scale_fill_viridis(discrete = TRUE)
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 607 rows containing non-finite values (stat_boxplot).
p + scale_fill_viridis(discrete = TRUE, option = "plasma")
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 607 rows containing non-finite values (stat_boxplot).
RSkittleBrewer is another option for funky colour palettes. ggsci has a variety of color palettes inspired by different scientific journals as well as television shows (https://cran.r-project.org/web/packages/ggsci/vignettes/ggsci.html).
There are many fantastic R packages to analyze and visualize your data. As a group, we are likely working in a variety of specialized areas. The plots we have made so far today should be useful for data exploration for many different kinds of data. In the next section we are going to preview some more complex visualization types, but since these take more time to go through and not everyone may be interested in network diagrams, time-series analysis, or geospatial data, we will not be plotting all of these together. We will, however do an interactive heatmap and also an upset plot. If you are interested in a tutorial on one of these other visualization types, please indicate that in the comments area of the lesson survey (https://www.surveymonkey.com/r/VNQZ3KS).
Plotly: - https://plot.ly/r/
ggvis: - http://ggvis.rstudio.com/interactivity.html
Heatmaps: - https://github.com/talgalili/heatmaply
Interactive time-series data: - https://rstudio.github.io/dygraphs/
One common way to look for outliers or trends in data is to calculate the correlation coefficients between samples and make a correlation plot. Since the cor() function expects samples to be in columns and observations in rows, this is an example of where you would use ‘wide’ data. The default correlation coefficient is the Pearson correlation, but others (Kendall, Spearman) can be specified. cor() takes in a matrix (numeric data only).
So we read in the wide version of our OTU table, keeping the Taxa as rownames since the matrix can only contain numeric data. Samples (the Country, Latrine Number and Depth combination) are in columns. There are a number of ways to handle missing data, one use is shown below.
The output correlation matrix must be converted into long format before we can make our plot.
cdat <- read.csv("data/SPE_pitlatrine.csv", row.names = 1)
cdat <- cor(cdat, method = "pearson", use = "complete.obs")
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
melt <-melt(cdat)
A heatmap can be made with ggplot by using the tile geom. The fill is now the correlation coefficient, which is a continuous variable (hence the gradient). I have changed the title of the legend from ‘value’ to ‘Pearson Correlation’ by using scale_fill_continuous(). The ‘’ included in the name is a line break. A fantastic resource for modifying legends is http://www.cookbook-r.com/Graphs/Legends_(ggplot2)/.
ggplot(melt, aes(Var1, Var2, fill = value)) +
geom_tile() +
xlab(NULL) +
ylab(NULL) +
scale_fill_continuous(name = "Pearson\nCorrelation") +
theme(axis.text.x = element_text(angle =90, hjust=1))
If this looks a little jarring to you compared to ‘normal’ heatmaps, it is probably because the data has not been ordered yet. This can be done by calculating the distance between samples (ie. euclidean) and then clustering them in a hierarchical way (ie. complete linkage).
dist() operates on the correlation matrix it was output from corr and hclust() works on the output of dist(). Once we reorder our correlation matrix, we then convert it to long format for plotting.
To have a different colour scheme that for example, goes from red to yellow, you can use scale_fill_gradient() and specify what you want the diverging colours to be. Note that your legend title now derives from this gradient.
distance <- dist(cdat)
clust <- hclust(distance)
cdat <- cdat[clust$order, clust$order]
melt <-melt(cdat)
ggplot(melt, aes(Var1, Var2, fill = value)) + geom_tile() +
scale_fill_gradient(name = "Pearson\nCorrelation", low = "red", high = "yellow") +
xlab(NULL) +
ylab(NULL)+
theme(axis.text.x = element_text(angle =90, hjust=1, size =5), legend.title = element_text(size=10), axis.text.y = element_text(size=5))
We can now see that there is a patch of dark red samples with low correlation. It is difficult to see which sites these are as the names are a bit small.
Enter the interactive heatmap! Plotly allows for interactive plots in a variety of languages, and allows the creation of sharable links (if you are interested sign up for a free account to get API credentials https://plot.ly/r/getting-started/). They have created a package, which, luckily for us, interfaces seamlessly with ggplot2. We only need to save our plot and call the function ggplotly() to have an interactive graph!
The response of plotly to manipulations other than hovering can be a bit slow if not using the API and may freeze/crash R, so I recommend saving your code before trying it.
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
p <- ggplot(melt, aes(Var1, Var2, fill = value)) + geom_tile() +
scale_fill_gradient(name = "Pearson\nCorrelation", low = "red", high = "yellow") +
xlab(NULL) +
ylab(NULL)+
theme(axis.text.x = element_text(angle =90, hjust=1), legend.title = element_text(size=10))
ggplotly(p)
## We recommend that you use the dev version of ggplot2 with `ggplotly()`
## Install it with: `devtools::install_github('hadley/ggplot2')`
Hovering will highlight the variables and values relating to that specific data point, and areas can be zoomed in on. We can now see the names of the low correlation samples. While they are all from Tanzania, they do not appear to all be from the same Latrine_Number or Depth.
ggvis is a Bioconductor package allowing interactive graphics in R (http://ggvis.rstudio.com/). It is founded on the grammar of graphics and the syntax is similar, but not identical to ggplot2. It is good for interactive scatterplots, bar graphs, line graphs and histograms (although not heatmaps at this point in time). The nature of the interactiveness is highly customizable: you can hover, add slider bars, checkboxes or radiobuttons (it uses Shiny’s reactive programming for anyone who might be familiar).
In this example I have just taken a scatterplot that we made earlier, and made it possible to hover over a point and see the underlying data. Don’t worry about the function and syntax too much. I just wanted to show you another option for interactive visualization that hopefully isn’t too intimidating.
library(ggvis)
##
## Attaching package: 'ggvis'
## The following objects are masked from 'package:plotly':
##
## add_data, hide_legend
## The following object is masked from 'package:ggplot2':
##
## resolution
p <- ggvis(ndat, x= ~TS, y= ~CODt)
layer_points(p) %>% add_tooltip(function(x) c(paste0("TS=", x$TS, ", CODT=", x$CODt)))
ggplot2 has its own function for saving its graphics: ggsave().
Explain graphics device……….
You can send the plot object to the graphics device and then save that image.
With ggsave you can minimally input the filename you would like to have, and the path to your file (you can instead combine the path and filename in the filename).
ggplot(melt, aes(Var1, Var2, fill = value)) + geom_tile() +
scale_fill_gradient(name = "Pearson\nCorrelation", low = "red", high = "yellow") +
xlab(NULL) +
ylab(NULL)+
theme(axis.text.x = element_text(angle =90, hjust=1, size =5), legend.title = element_text(size=10), axis.text.y = element_text(size=5))
ggsave("heatmap.png", path = "img")
## Saving 7 x 5 in image
However, in some cases you want to tailor your output. You can specify the width, height and units of your image, or you can apply a scaling factor. You can also specify the plot object you want to save instead of whatever was on your graphics device last using the ‘plot’ parameter. ggsave() can save in a variety of file extensions including pdf, jpeg, tiff, bmp, svg or png.
hmap <- ggplot(melt, aes(Var1, Var2, fill = value)) + geom_tile() +
scale_fill_gradient(name = "Pearson\nCorrelation", low = "red", high = "yellow") +
xlab(NULL) +
ylab(NULL)+
theme(axis.text.x = element_text(angle =90, hjust=1, size =5), legend.title = element_text(size=10), axis.text.y = element_text(size=5))
ggsave("heatmap.pdf", plot = hmap, path = "img", scale = 2, width = 150, height = 110, units = "mm")
There are a variety of methods to mix multiple graphs on the same page, however ggplot does not work well with all of them. I am going to work with a package based that uses gridExtra(which allows us to arrange plots) but works well with ggplot called ggpubr (which allows us to align the axes of our plots). For a demonstration, we are going to take 3 plots that we made earlier (a boxplot, a histogram, and a bubble plot), save them as objects, and then arrange and align them in the same figure. (http://www.sthda.com/english/rpkgs/ggpubr/)
There is a function ggarrange() which takes your plots, their labels, and how they are arranged in rows and columns. To start, if we just have our boxplot and bubble plot, and we wanted them to be side by side, if you picture each plot as a square in a grid, we would have two columns ncol = 2 and 1 row
library(ggpubr)
## Loading required package: magrittr
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
hist <- ggplot(dat[dat$OTU >=2,], aes(x=OTUs, fill=Country, alpha=0.3)) +
geom_histogram(binwidth = 50, position = "dodge") +
xlim(0,1000) +
ylim(0,150) +
geom_rug()
bub <- ggplot(bdat, aes(x = Country, y = OTUs_per_Taxa, fill=Taxa)) +
scale_y_log10() +
geom_jitter(aes(size = sqrt(OTUs_per_Taxa/pi)), pch = 21, show.legend = FALSE) +
scale_size_continuous(range=c(1,30)) +
geom_text(aes(label = Taxa), check_overlap = TRUE, show.legend = FALSE)
ggarrange(bub, hist,
labels = c("A", "B"),
ncol = 2, nrow = 1)
While our grid areas are the same size, our plot backgrounds are not. Let’s adjust the legend so that it is in the top right corner of the plot, and get remove the white background. We do not need the alpha legend. Do you remember how to get rid of it?
hist <- ggplot(dat[dat$OTU >=2,], aes(x=OTUs, fill=Country, alpha=0.3)) +
geom_histogram(binwidth = 50, position = "dodge") +
xlim(0,1000) +
ylim(0,150) +
geom_rug() +
guides(alpha = FALSE) +
theme(legend.justification=c(1,1), legend.position=c(1,1), legend.background = element_blank())
ggarrange(bub, hist,
labels = c("A", "B"),
ncol = 2, nrow = 1)
Imagine a square with 4 boxes. We are going to put our boxplot in the left top and bottom boxes, the histogram in the top right box, and the bubble plot in the bottom right box. To do this, we are arranging 2 columns (one with the boxplot and one with the histogram + boxplot) and we are arranging 2 rows (one with the histogram and one with the boxplot).
box <- ggplot(dat[(dat$Taxa=="Clostridia" | dat$Taxa == "Gammaproteobacteria" | dat$Taxa == "Unknown" | dat$Taxa == "Bacilli") & dat$Country == "T",] , aes(x = Taxa, y = OTUs)) +
geom_boxplot() +
geom_point() +
theme(axis.text.x = element_text(angle=90, hjust=1)) +
scale_y_log10()
ggarrange(box, ggarrange(hist, bub,
labels = c("B", "C"),
nrow = 2),
ncol = 2, labels = "A")
The y-axis lines for the histogram and boxplot are not aligned, but this can be fixed with a call to
align="v".
ggarrange(box, ggarrange(hist, bub,
labels = c("B", "C"),
nrow = 2, align = "v"),
ncol = 2, labels = "A")
To make sure all axis titles are the same size (B and C look a little off to me), we can specify the font we want changed and the size we want to change it to. I am also going to make the legend title size the same. I want to make the bubble plot font a little smaller as well. Let’s look at the font function. Can I do that? What might be another option?
bub <- ggplot(bdat, aes(x = Country, y = OTUs_per_Taxa, fill=Taxa)) +
scale_y_log10() +
geom_jitter(aes(size = sqrt(OTUs_per_Taxa/pi)), pch = 21, show.legend = FALSE) +
scale_size_continuous(range=c(1,30)) +
geom_text(aes(label = Taxa, size=16), check_overlap = TRUE, show.legend = FALSE)
ggarrange(box + font("axis.title", size=9), ggarrange(hist + font("axis.title", size=9) + font("legend.title", size=9), bub + font("axis.title", size=9),
labels = c("B", "C"),
nrow = 2, align = "v"),
ncol = 2, labels = "A")
Challenge
Using the code available here for a line graph we made earlier in the lesson, make a combined figure. Imagine a square with 4 boxes. We are going to put our line graph in the top right and left boxes, the histogram in the bottom left box, and the bubble plot in the bottom right box. Make sure the legend and axis titles are the same size. Change the legend text for the line graph to be smaller than the legend title.
lin <- ggplot(errdat, aes(x=Temp, y=pH, colour = Country)) +
geom_line() +
geom_errorbar(aes(ymin=pH-sd_pH, ymax = pH+sd_pH), width = .2, alpha = 0.4)
visNetwork (based on igraph) https://datastorm-open.github.io/visNetwork/edges.html
Upset plots are an alternative to venn diagrams that show the intersection of sets, as well as the size of the sets. Additionally, venn diagrams can be difficult to interpret for greater than 2 or 3 sets. This is a real life figure from BMC Bioinformatics. Sure it looks pretty, but what does the number 24 represent in this picture in terms of A, B, C, D, and E?
Let’s see how UpSet plots work practically. The question that we are asking is how many sites are each Taxa represented at and what is the overlap with other Taxa? The data that we have is an OTU table. For this purpose we simply want to know whether a given Taxa was present or absent in the sample, and then we can form intersections based on this information. So, for each OTU value that is not 0, we replace it with a 1 instead. Then all we have to do is to enter our data set, the number of sets we are inputting, if we want to order the results (in this case by frequency), and how many intersections we want to show. I have picked 8 different Taxa, which would make a crazy venn diagram with a lot of zeros in it since a couple of these Taxa are rare. Here, I will show 15 intersections - we know the remaining intersections would be zero since this is ordered by frequency.
library(UpSetR)
subdat <- read.csv("data/SPE_pitlatrine.csv", stringsAsFactors = FALSE)
test <- subdat %>%
rownames_to_column %>%
gather(var, value, -rowname) %>%
spread(rowname, value)
colnames(test) <- test[30,]
test <- test[-30,]
test[,-1] <- apply(test[,-1], 2, function(x) ifelse(as.numeric(x)!=0, 1, 0))
test <- test[, c(1,4,9,17,22,19,20,33,35)]
upset(test, nsets = 8, empty.intersections = TRUE, order.by = "freq", nintersects = 15, main.bar.color = "black", sets.bar.color = "#56B4E9" )
Let’s look at our greatest Intersection Size (equal to 28). This means that 28 of our 81 samples have Clostridia, Alphaproteobacteria, and Deinococci present in the same sample WITHOUT Chloroflexi, both Acidobacteria, Cyanobacteria or Lentisphaeria also being present. We can see from the Set Size that Clostridia and Alphaproteobacteria are present in almost all samples, and Deinococci is present in greater than 3/4 of the samples. You might expect this overlap to be higher, but remember that this is without the other Taxa and moving along our interesection sizes and dot matrix, we can see that other intersections include these bacteria. Some quick mental math shows that 63 samples have these 3 bacteria present.
Static Maps: - https://bhaskarvk.github.io/user2017.geodataviz/notebooks/02-Static-Maps.nb.html
Interactive Maps: - https://bhaskarvk.github.io/user2017.geodataviz/notebooks/03-Interactive-Maps.nb.html
treeman: - https://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-2340-8
metacoder: - https://github.com/grunwaldlab/metacoder
phyloseq: - https://joey711.github.io/phyloseq/index.html
ggbio: - http://www.bioconductor.org/packages/2.11/bioc/vignettes/ggbio/inst/doc/ggbio.pdf
GenVisR: - https://bioconductor.org/packages/release/bioc/vignettes/GenVisR/inst/doc/GenVisR_intro.html
GenomeGraphs: - https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-2
Challenge
Install and load the gapminder package. Save the gapminder data into an object using dat <- gapminder. Plot the following:
Resources:
Wickham, Hadley. (2010). A Layered Grammar of Graphics. Journal of Computational and Statistical Graphics.
Wilkinson, L. (2005), The Grammar of Graphics (2nd ed.). Statistics and Computing, New York: Springer. [14, 18]
http://www.cookbook-r.com/Graphs/
https://github.com/jennybc/ggplot2-tutorial
Tufte, Edward R. The Visual Display of Quantitative Information.
http://stcorp.nl/R_course/tutorial_ggplot2.html
http://ggplot2.tidyverse.org/reference/theme.html http://joeystanley.com/blog/custom-themes-in-ggplot2 https://github.com/jrnold/ggthemes https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/colorPaletteCheatsheet.pdf https://cran.r-project.org/web/packages/ggrepel/vignettes/ggrepel.html http://www.cookbook-r.com/Graphs/Legends_(ggplot2)/ https://github.com/eclarke/ggbeeswarm https://cran.r-project.org/web/packages/ggsci/vignettes/ggsci.html http://www.sthda.com/english/rpkgs/ggpubr/ https://rpubs.com/drsong/9575 http://elpub.bib.uni-wuppertal.de/edocs/dokumente/fbb/wirtschaftswissenschaft/sdp/sdp15/sdp15006.pdf http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/81-ggplot2-easy-way-to-mix-multiple-graphs-on-the-same-page/
Your feedback is essential to help the next cohort of trainees. Please take a minute to complete the following short survey: https://www.surveymonkey.com/r/VNQZ3KS
Thanks for coming!!!